Simplify `slice::Iter::next` enough that it inlines #136771

scottmcm · 2025-02-09T11:00:05Z

Inspired by this zulip conversation: https://rust-lang.zulipchat.com/#narrow/channel/189540-t-compiler.2Fwg-mir-opt/topic/Feedback.20on.20a.20MIR.20optimization.20idea/near/498579990

~~Draft for now because it needs #136735 to get the codegen tests to pass.~~

rustbot · 2025-02-09T11:00:13Z

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

scottmcm · 2025-02-09T11:05:21Z

Let's see whether it actually improves things:
@bors try @rust-timer queue

bors · 2025-02-09T11:06:31Z

⌛ Trying commit f7970b3 with merge 30df00c...

bors · 2025-02-09T12:58:41Z

☀️ Try build successful - checks-actions
Build commit: 30df00c (30df00cd8218095deb80cc6b913de02f5ae4a5b0)

rust-timer · 2025-02-09T15:26:04Z

Finished benchmarking commit (30df00c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.4%] | 1 |
| Regressions ❌ (secondary) | 9.4% | [9.4%, 9.4%] | 1 |
| Improvements ✅ (primary) | -0.6% | [-2.2%, -0.1%] | 210 |
| Improvements ✅ (secondary) | -0.5% | [-1.1%, -0.1%] | 131 |
| All ❌✅ (primary) | -0.6% | [-2.2%, 0.4%] | 211 |

Max RSS (memory usage)

Results (primary -2.2%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.4% | [1.7%, 2.8%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.4% | [-8.5%, -2.4%] | 6 |
| Improvements ✅ (secondary) | -2.5% | [-2.5%, -2.5%] | 1 |
| All ❌✅ (primary) | -2.2% | [-8.5%, 2.8%] | 9 |

Cycles

Results (primary -1.0%, secondary 1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.6% | [4.6%, 4.6%] | 1 |
| Improvements ✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |
| Improvements ✅ (secondary) | -1.1% | [-1.1%, -1.1%] | 1 |
| All ❌✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |

Binary size

Results (primary 0.1%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.0%, 5.0%] | 28 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.5%, -0.0%] | 45 |
| Improvements ✅ (secondary) | -0.1% | [-1.3%, -0.0%] | 48 |
| All ❌✅ (primary) | 0.1% | [-0.5%, 5.0%] | 73 |

Bootstrap: 780.488s -> 778.559s (-0.25%)
Artifact size: 329.04 MiB -> 329.26 MiB (0.06%)

scottmcm · 2025-02-11T09:53:21Z

tests/codegen/slice-iter-nonnull.rs

 pub fn slice_iter_next<'a>(it: &mut std::slice::Iter<'a, u32>) -> Option<&'a u32> {
-    // CHECK: %[[ENDP:.+]] = getelementptr inbounds{{( nuw)?}} i8, ptr %it, {{i32 4|i64 8}}
-    // CHECK: %[[END:.+]] = load ptr, ptr %[[ENDP]]
+    // CHECK: %[[START:.+]] = load ptr, ptr %it,


Rebased atop the transmute-gives-asserts change; the codegen tests should pass now with only this trivial change: it loads the start pointer first instead of the end pointer.
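To make the load order concrete, here is a minimal sketch of a two-pointer iterator whose `next` reads the start pointer before comparing it against the end. `TwoPtrIter` and its field names are hypothetical and only model the idea for `u32`; this is not the real `slice::Iter`, and it makes no claim about the exact IR that rustc emits.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical two-pointer iterator over `u32`s (not the real `slice::Iter`).
pub struct TwoPtrIter<'a> {
    start: NonNull<u32>,
    end: *const u32,
    _marker: PhantomData<&'a u32>,
}

impl<'a> TwoPtrIter<'a> {
    pub fn new(slice: &'a [u32]) -> Self {
        let range = slice.as_ptr_range();
        Self {
            // A slice's data pointer is never null, even for an empty slice.
            start: NonNull::new(range.start.cast_mut()).expect("slice pointers are non-null"),
            end: range.end,
            _marker: PhantomData,
        }
    }

    pub fn next(&mut self) -> Option<&'a u32> {
        // Read the start pointer first, then compare it against the end.
        let start = self.start;
        if start.as_ptr().cast_const() == self.end {
            return None;
        }
        // SAFETY: `start != end`, so `start` points at a live element of the
        // borrowed slice and `start + 1` is at most one past its end.
        unsafe {
            self.start = NonNull::new_unchecked(start.as_ptr().add(1));
            Some(&*start.as_ptr())
        }
    }
}
```

With this shape, the first memory access in `next` is the field at offset 0 of the iterator, which is the kind of load the updated `CHECK` line matches on `%it`.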

scottmcm · 2025-02-11T09:55:21Z

tests/mir-opt/pre-codegen/slice_iter.enumerated_loop.PreCodegen.after.panic-abort.mir

    let mut _0: ();
-    let mut _11: std::slice::Iter<'_, T>;
-    let mut _12: std::iter::Enumerate<std::slice::Iter<'_, T>>;
-    let mut _13: std::iter::Enumerate<std::slice::Iter<'_, T>>;


Nice to see that the Enumerate iterators get completely SRoAed even just in MIR, with this!
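For readers unfamiliar with the test, the function behind `slice_iter.enumerated_loop` is presumably shaped like the sketch below. The exact signature is an assumption and the real test file may differ. SRoA (scalar replacement of aggregates) here means the `Enumerate<slice::Iter<_>>` locals removed in the diff above are split into their individual fields, so no whole-struct temporaries remain in the MIR.

```rust
/// Assumed shape of the function under test; the real test file may differ.
pub fn enumerated_loop<T>(slice: &[T], f: impl Fn(usize, &T)) {
    // `enumerate()` wraps the slice iterator in a struct holding the inner
    // iterator plus a counter; after inlining, both can live in plain locals.
    for (i, x) in slice.iter().enumerate() {
        f(i, x);
    }
}
```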

scottmcm · 2025-02-11T09:56:09Z

Just to check that having the assumes in the LLVM-IR doesn't somehow lose all of the gains:
@bors try @rust-timer queue

bors · 2025-02-11T09:57:24Z

⌛ Trying commit 85deb4d with merge eaf73cd...

bors · 2025-02-11T11:54:12Z

☀️ Try build successful - checks-actions
Build commit: eaf73cd (eaf73cd9c4fc608d616056b232ccb9cbb8df5679)

rust-timer · 2025-02-11T13:11:22Z

Finished benchmarking commit (eaf73cd): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.5%] | 2 |
| Regressions ❌ (secondary) | 9.3% | [9.3%, 9.3%] | 1 |
| Improvements ✅ (primary) | -0.6% | [-2.2%, -0.1%] | 196 |
| Improvements ✅ (secondary) | -0.5% | [-2.4%, -0.1%] | 101 |
| All ❌✅ (primary) | -0.6% | [-2.2%, 0.5%] | 198 |

Max RSS (memory usage)

Results (primary -1.8%, secondary -3.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.8% | [2.1%, 3.3%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.9% | [-8.7%, -0.7%] | 13 |
| Improvements ✅ (secondary) | -3.2% | [-7.1%, -2.0%] | 29 |
| All ❌✅ (primary) | -1.8% | [-8.7%, 3.3%] | 16 |

Cycles

Results (primary -1.4%, secondary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.0% | [2.1%, 5.4%] | 5 |
| Improvements ✅ (primary) | -1.4% | [-1.8%, -1.0%] | 3 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | -1.4% | [-1.8%, -1.0%] | 3 |

Binary size

Results (primary 0.0%, secondary -0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.0%, 3.9%] | 38 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-1.8%, -0.1%] | 50 |
| Improvements ✅ (secondary) | -0.2% | [-2.0%, -0.0%] | 83 |
| All ❌✅ (primary) | 0.0% | [-1.8%, 3.9%] | 88 |

Bootstrap: 785.339s -> 785.217s (-0.02%)
Artifact size: 348.32 MiB -> 348.38 MiB (0.02%)

scottmcm · 2025-02-14T18:41:02Z

With #136735 having landed, this is good for a review now.

@rustbot ready

The one large regression is secondary, and looks to be one of the codegen units just having way more to do in LLVM:

the8472 · 2025-02-14T22:42:17Z

library/core/src/slice/iter/macros.rs

-                // safe since we check if the iterator is empty first.
+                let ptr = self.ptr;
+                let end_or_len = self.end_or_len;
+                // SAFETY: Type invariants.


The safety comment is a bit too sloppy imo. At least it should say something like "same as above" if you want to avoid repetition. Or maybe split it into two unsafe blocks, one for each arm.

scottmcm

Yeah, true.

Weirdly, when I added tighter-scoped unsafe blocks it stopped inlining (I even rebuilt to check, because that's so strange), so I kept one bigger block and added more specific comments inside it.
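As an illustration of the layout discussed in this thread (a single larger `unsafe` block with a specific justification next to each operation), here is a hedged sketch. `SketchIter` is a hypothetical stand-in, not the code in `macros.rs`: it borrows the `ptr` / `end_or_len` field names from the diff and assumes that `end_or_len` holds an end pointer for sized element types and a remaining length for zero-sized ones. It also uses a plain checked subtraction where the real code may use an unchecked one.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical stand-in for the iterator, not the real `slice::Iter`.
pub struct SketchIter<'a, T> {
    ptr: NonNull<T>,
    /// End pointer for sized `T`; remaining length (as an address) for zero-sized `T`.
    end_or_len: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> SketchIter<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        let ptr = NonNull::from(slice).cast::<T>();
        let end_or_len = if size_of::<T>() == 0 {
            slice.len() as *const T
        } else {
            slice.as_ptr_range().end
        };
        Self { ptr, end_or_len, _marker: PhantomData }
    }

    pub fn next(&mut self) -> Option<&'a T> {
        let ptr = self.ptr;
        let end_or_len = self.end_or_len;
        // One block covering both arms; each operation carries its own reason.
        unsafe {
            if size_of::<T>() == 0 {
                let len = end_or_len as usize;
                if len == 0 {
                    return None;
                }
                // Cannot underflow: `len != 0` was checked just above.
                self.end_or_len = (len - 1) as *const T;
            } else {
                if ptr.as_ptr().cast_const() == end_or_len {
                    return None;
                }
                // SAFETY: `ptr != end`, so `ptr` is in bounds and `ptr + 1`
                // is at most one past the end of the borrowed slice.
                self.ptr = NonNull::new_unchecked(ptr.as_ptr().add(1));
            }
            // SAFETY: the iterator is non-empty, so `ptr` refers to a live
            // element (or is a valid aligned address for a zero-sized `T`)
            // that stays borrowed for `'a`.
            Some(&*ptr.as_ptr())
        }
    }
}
```

Keeping both arms in one block means the early `return None` paths and the final dereference share a scope, which is one plausible reason tighter blocks changed the generated code; the thread itself does not establish the cause.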

joboet

This looks great, and the benchmarks speak for themselves – I think we can justify the one regression.

I have one silly nit, which you are welcome to ignore. Otherwise, r=me

library/core/src/slice/iter/macros.rs

scottmcm · 2025-02-20T17:36:26Z

@bors r=joboet

bors · 2025-02-20T17:36:30Z

📌 Commit 7add358 has been approved by joboet

It is now in the queue for this repository.

bors · 2025-02-20T18:20:43Z

⌛ Testing commit 7add358 with merge f04bbc6...

bors · 2025-02-20T21:50:25Z

☀️ Test successful - checks-actions
Approved by: joboet
Pushing f04bbc6 to master...

rust-timer · 2025-02-20T23:07:25Z

Finished benchmarking commit (f04bbc6): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

- If the regression was expected or you think it can be justified, please write a comment with sufficient written justification, and add @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
- If you think that you know of a way to resolve the regression, try to create a new PR with a fix for the regression.
- If you do not understand the regression or you think that it is just noise, you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.4%] | 2 |
| Regressions ❌ (secondary) | 8.8% | [8.8%, 8.8%] | 1 |
| Improvements ✅ (primary) | -0.4% | [-1.4%, -0.1%] | 123 |
| Improvements ✅ (secondary) | -0.5% | [-2.3%, -0.1%] | 70 |
| All ❌✅ (primary) | -0.4% | [-1.4%, 0.4%] | 125 |

Max RSS (memory usage)

Results (primary -2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 7.3% | [5.5%, 9.0%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.8% | [-9.1%, -1.8%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.1% | [-9.1%, 9.0%] | 7 |

Cycles

Results (primary -1.2%, secondary 0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.8% | [3.8%, 3.8%] | 1 |
| Improvements ✅ (primary) | -1.2% | [-2.3%, -0.8%] | 8 |
| Improvements ✅ (secondary) | -2.0% | [-2.0%, -2.0%] | 1 |
| All ❌✅ (primary) | -1.2% | [-2.3%, -0.8%] | 8 |

Binary size

Results (primary -0.1%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.0%, 0.8%] | 10 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.4%, -0.0%] | 58 |
| Improvements ✅ (secondary) | -0.3% | [-1.6%, -0.0%] | 11 |
| All ❌✅ (primary) | -0.1% | [-0.4%, 0.8%] | 68 |

Bootstrap: 774.026s -> 774.477s (0.06%)
Artifact size: 360.27 MiB -> 361.04 MiB (0.21%)

rylev · 2025-02-25T20:03:14Z

Perf improvements vastly outweigh the regressions

@rustbot label: +perf-regression-triaged

rustbot assigned joboet Feb 9, 2025

rustbot added the S-waiting-on-review, T-compiler, and T-libs labels Feb 9, 2025

rustbot added the S-waiting-on-perf label Feb 9, 2025

rustbot added the perf-regression label and removed the S-waiting-on-perf label Feb 9, 2025

FractalFir mentioned this pull request Feb 9, 2025

[perf experiment] A MIR pass dedicated to optimizing common iterators #136745

Closed

scottmcm mentioned this pull request Feb 11, 2025

transmute should also assume non-null pointers #136735

Merged

scottmcm force-pushed the poke-slice-iter-next branch from f7970b3 to 85deb4d Compare February 11, 2025 09:51

scottmcm removed the T-compiler label Feb 11, 2025

rustbot added the S-waiting-on-perf label Feb 11, 2025

scottmcm reopened this Feb 14, 2025

scottmcm marked this pull request as ready for review February 14, 2025 18:40

the8472 reviewed Feb 14, 2025

the8472 reviewed Feb 14, 2025

scottmcm added 4 commits February 14, 2025 22:24

Simplify slice::Iter::next enough that it inlines

aede8f5

Save another BB by using SubUnchecked instead of a call to `arith_offset`

3a62c70

Probably reasonable anyway since it more obviously drops provenance.

Go back to Some instead of transmuting to it.

39118d6

This adds a few more statements to `next`, but optimizes better in the loops (saving 2 blocks in `forward_loop`, for example)

Add real safety comments

7add358
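For context on the commit "Go back to Some instead of transmuting to it", the sketch below contrasts the two ways of producing the `Option<&T>` return value. It relies on the documented guarantee that `Option<&T>` has the same layout as a pointer, with `None` represented as null. The function names are hypothetical, and the exact expression used in the earlier revision of the PR is an assumption.

```rust
use std::mem::transmute;
use std::ptr::NonNull;

/// Wrap the reference with the `Some` constructor.
pub fn some_directly<'a>(r: &'a u32) -> Option<&'a u32> {
    Some(r)
}

/// Reinterpret a non-null pointer as `Option<&u32>` (hypothetical variant).
pub fn some_via_transmute<'a>(r: &'a u32) -> Option<&'a u32> {
    let p: NonNull<u32> = NonNull::from(r);
    // SAFETY: `Option<&u32>` is pointer-sized with `None` as the null value,
    // and `p` is non-null, aligned, and derived from a reference that is
    // live for `'a`, so its bits are a valid `Some(&u32)`.
    unsafe { transmute::<NonNull<u32>, Option<&'a u32>>(p) }
}
```

Both functions return the same value; per the commit message, the `Some` form adds a few statements to `next` but optimizes better once inlined into loops.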

scottmcm force-pushed the poke-slice-iter-next branch from 4ddbbbd to 7add358 Compare February 15, 2025 07:03

joboet approved these changes Feb 20, 2025

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label Feb 20, 2025

bors added the merged-by-bors label Feb 20, 2025

bors merged commit f04bbc6 into rust-lang:master Feb 20, 2025
7 checks passed

rustbot added this to the 1.87.0 milestone Feb 20, 2025

scottmcm deleted the poke-slice-iter-next branch February 21, 2025 05:48

rustbot added the perf-regression-triaged label Feb 25, 2025

carolynzech mentioned this pull request Mar 3, 2025

Update subtree/library to 2025-02-10 model-checking/verify-rust-std#262

Merged
